Collaborator

@DanielDango DanielDango commented Nov 28, 2025

This PR introduces the prompt optimizers developed during my bachelor's thesis into the LiSSA framework.
They can be run standalone or chained with regular evaluation pipeline runs.

The ProTeGi optimizer is suggested in #57

This commit introduces a prompt optimization system that enables
automatic tuning of classification prompts to improve trace link recovery performance.

Core Components:
- PromptOptimizer interface and IterativeOptimizer implementation for naive iterative
  prompt refinement
- Optimization class as the main orchestrator for running optimization workflows
- OptimizeCommand CLI integration for executing optimizations from command line

Evaluation Framework:
- Evaluator base class with strategy pattern for different evaluation approaches
- BruteForceEvaluator for exhaustive prompt candidate evaluation
- UpperConfidenceBoundBanditEvaluator for efficient exploration-exploitation tradeoff
  using UCB1 algorithm to reduce computational costs
- MockEvaluator for testing purposes
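
For readers unfamiliar with UCB1, the selection rule behind a bandit-style evaluator can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of UpperConfidenceBoundBanditEvaluator: the class `Ucb1Sketch`, the method `selectBest`, and the `pull` callback (which stands in for scoring one prompt candidate on one sample and returns a reward in [0, 1]) are all hypothetical names.

```java
import java.util.function.IntToDoubleFunction;

/** Minimal UCB1 sketch; all names here are hypothetical, not LiSSA's API. */
final class Ucb1Sketch {

    /**
     * Spends {@code budget} pulls across {@code arms} candidates and returns the
     * index of the candidate with the highest observed mean reward.
     */
    static int selectBest(int arms, int budget, IntToDoubleFunction pull) {
        int[] counts = new int[arms];
        double[] sums = new double[arms];

        for (int t = 1; t <= budget; t++) {
            int chosen = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < arms; a++) {
                if (counts[a] == 0) {
                    // Every candidate is tried once before the bound is used.
                    chosen = a;
                    break;
                }
                double mean = sums[a] / counts[a];
                // Exploration bonus shrinks as a candidate is pulled more often.
                double bonus = Math.sqrt(2.0 * Math.log(t) / counts[a]);
                if (mean + bonus > bestScore) {
                    bestScore = mean + bonus;
                    chosen = a;
                }
            }
            sums[chosen] += pull.applyAsDouble(chosen);
            counts[chosen]++;
        }

        // Report the candidate with the best mean among those actually pulled.
        int best = 0;
        for (int a = 1; a < arms; a++) {
            if (counts[a] == 0) {
                continue;
            }
            if (counts[best] == 0 || sums[a] / counts[a] > sums[best] / counts[best]) {
                best = a;
            }
        }
        return best;
    }
}
```

Compared with a brute-force evaluation, which scores every candidate on every sample, the bound concentrates the budget on candidates that still look promising, which is where the cost reduction comes from.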

Metrics System:
- Metric interface with PointwiseMetric and GlobalMetric implementations
- FBetaMetric for computing precision, recall, and F-beta scores
- Scorer interface with BinaryScorer for trace link classification scoring
- Reductor interface with MeanReductor for aggregating pointwise metrics
- Factory classes (MetricFactory, ScorerFactory, ReductorFactory) for instantiation
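
As a reference for what an F-beta metric computes, here is a small sketch of the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) from true positive, false positive, and false negative counts. The class `FBetaSketch` and its method names are assumptions for illustration; the PR's FBetaMetric may be structured differently.

```java
/** Standard precision, recall, and F-beta; names are hypothetical, not LiSSA's API. */
final class FBetaSketch {

    static double precision(int tp, int fp) {
        // Defined as 0 when nothing was predicted positive.
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    static double recall(int tp, int fn) {
        // Defined as 0 when there are no positives in the gold standard.
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    static double fBeta(int tp, int fp, int fn, double beta) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        double betaSquared = beta * beta;
        double denominator = betaSquared * p + r;
        // beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
        return denominator == 0.0 ? 0.0 : (1.0 + betaSquared) * p * r / denominator;
    }
}
```

For example, with 8 true positives, 2 false positives, and 8 false negatives, precision is 0.8 and recall is 0.5, so `fBeta(8, 2, 8, 1.0)` is 0.8 / 1.3, roughly 0.615.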

Infrastructure Updates:
- Extended ModuleConfiguration with prompt management capabilities
- Added OptimizerConfiguration for optimizer-specific settings
- Enhanced CacheManager with scorer-specific caching via ScorerCacheKey
- Added ChatLanguageModelUtils for LLM interaction helpers
- Updated Statistics class with methods for computing evaluation metrics

Classifier Modifications:
- Added setClassificationPrompt() and getClassificationPromptKey() to the abstract
  Classifier class to support dynamic prompt updates
- Replaced cache parameters with a map

Testing:
- Added PromptOptimizationUtilsTest with Markdown escaping tests
- Extended Requirement2RequirementE2ETest with optimizer test cases
- Added test resources with expected outputs and cache files for iterative
  and simple optimizer configurations
- Updated ArchitectureTest to ensure correct usage of dangerous overwrite methods

Configuration:
- Added example-configs/optimizer-config.json demonstrating optimizer setup
@DanielDango DanielDango marked this pull request as ready for review November 28, 2025 11:15
@DanielDango DanielDango requested a review from dfuchss as a code owner November 28, 2025 11:15
Copilot AI review requested due to automatic review settings November 28, 2025 11:15
@DanielDango
Collaborator Author

@dfuchss, your review is not yet required. I intended to request an initial review from @copilot. I will request your review when the ProTeGi (Gradient) and naive optimizer with feedback of misclassified TLs (Feedback) are added.

Copilot AI commented Nov 28, 2025

@DanielDango I've opened a new pull request, #45, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive prompt optimization framework to the LiSSA trace link analysis system. The framework enables iterative improvement of classification prompts through automated optimization strategies, integrating metrics, evaluators, and optimizer implementations.

Key Changes:

  • Adds prompt optimizer framework with iterative optimization support
  • Introduces metric system (pointwise and global metrics) for evaluating prompt performance
  • Implements evaluator strategies (brute-force and Upper Confidence Bound bandit algorithms)
  • Extends configuration system to support optimization workflows

Reviewed changes

Copilot reviewed 52 out of 54 changed files in this pull request and generated 28 comments.

| File | Description |
| --- | --- |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/* | Core optimizer interfaces and implementations (IterativeOptimizer, MockOptimizer, OptimizerFactory, utilities) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/* | Metric system with scorers (BinaryScorer), reductors (MeanReductor), and metric implementations (PointwiseMetric, GlobalMetric, FBetaMetric) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/* | Evaluator base class with BruteForceEvaluator and UpperConfidenceBoundBanditEvaluator implementations |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Optimization.java | New optimization pipeline that extends evaluation with prompt optimization step |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Evaluation.java | Extended to support prompt overwriting for optimization workflow |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/configuration/* | OptimizerConfiguration record and ModuleConfiguration setArgument methods |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/* | Added getCacheParameters() method and changed copyOf() visibility to public |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/* | New ScorerCacheKey and CacheManager enhancements for optimizer caching |
| src/main/java/edu/kit/kastel/sdq/lissa/cli/command/OptimizeCommand.java | CLI command for running optimization and optional evaluation pipelines |
| src/test/java/edu/kit/kastel/sdq/lissa/ratlr/* | Test files including PromptOptimizationUtilsTest and E2E optimizer tests |
| src/test/resources/warc/* | Test configurations and expected outputs for optimizer validation |
| example-configs/optimizer-config.json | Example optimizer configuration file |

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 52 out of 54 changed files in this pull request and generated 10 comments.


DanielDango and others added 14 commits November 28, 2025 13:16
 - refines iterative optimizer with feedback on misclassified trace links (false positive or false negative)
 - add sample strategy to configure which TLs will be used as feedback
 - add SLF4J Simple Logger configuration for improved logging in tests
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… feature/add-prompt-optimization-module

# Conflicts:
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ChatLanguageModelProvider.java

Copilot AI commented Dec 23, 2025

@DanielDango I've opened a new pull request, #49, to work on those changes. Once the pull request is ready, I'll request review from you.

…tive optimization prompt

 - update e2e test cases
 - add missing spaces in optimization prompt
 - revert "revert reduced target store deduplication"

This reverts commit 63df5fb.
@DanielDango DanielDango force-pushed the feature/add-prompt-optimization-module branch from 6bef3cb to a14496b Compare January 22, 2026 11:44
@DanielDango DanielDango marked this pull request as ready for review January 22, 2026 11:46
Copilot AI review requested due to automatic review settings January 22, 2026 11:46
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 86 out of 89 changed files in this pull request and generated 9 comments.


Comment on lines +145 to +150
* @param config The configuration object
* @throws IOException If there are issues setting up the cache
*/
public Evaluation(Configuration config) throws IOException {
this.configuration = config;
// TODO maybe dont?

Copilot AI Jan 22, 2026

The comment states "TODO maybe dont?" regarding setting configFile to null. This suggests uncertainty about the design. Null config files could cause NullPointerExceptions downstream. Either handle this case properly or document why null is acceptable here.

Suggested change
- * @param config The configuration object
- * @throws IOException If there are issues setting up the cache
- */
- public Evaluation(Configuration config) throws IOException {
- this.configuration = config;
- // TODO maybe dont?
+ * <p>
+ * Note: When using this constructor there is no associated configuration file on disk.
+ * Consequently, {@link #configFile} is set to {@code null} by design and any code accessing
+ * it must first check for {@code null}.
+ * </p>
+ *
+ * @param config The configuration object
+ * @throws IOException If there are issues setting up the cache
+ */
+ public Evaluation(Configuration config) throws IOException {
+ this.configuration = config;
+ // No configuration file is associated with this instance; configFile remains null by design.
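
One way to make the "may be absent" case explicit, instead of relying on callers to remember a null check, is to expose the file through an Optional-returning accessor. The sketch below is only an illustration of that idea: `EvaluationSketch`, `inMemory()`, and `describeSource()` are hypothetical names and do not reflect how Evaluation is actually written.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical stand-in for a class whose config file may be absent. */
final class EvaluationSketch {

    // Null when the instance was built from an in-memory configuration object.
    private final Path configFile;

    EvaluationSketch(Path configFile) {
        this.configFile = configFile;
    }

    /** Factory for instances that have no configuration file on disk. */
    static EvaluationSketch inMemory() {
        return new EvaluationSketch(null);
    }

    /** Callers must handle the empty case explicitly; no raw null escapes. */
    Optional<Path> configFile() {
        return Optional.ofNullable(configFile);
    }

    String describeSource() {
        return configFile()
                .map(path -> "file " + path.getFileName())
                .orElse("in-memory configuration");
    }
}
```

With this shape, code that needs the path has to decide what to do when it is missing, which addresses the NullPointerException concern raised in the comment.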

@dfuchss
Member

dfuchss commented Jan 22, 2026

@DanielDango I think now, the challenge is to merge after changes from #48. Afterward, I can review this part of the PR :)

…optimization-module

# Conflicts:
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/CacheKey.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/CacheManager.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/CacheParameter.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/classifier/ClassifierCacheKey.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/classifier/ClassifierCacheParameter.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/embedding/EmbeddingCacheKey.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/embedding/EmbeddingCacheParameter.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ReasoningClassifier.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/SimpleClassifier.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/embeddingcreator/CachedEmbeddingCreator.java
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/preprocessor/SummarizePreprocessor.java
#	src/test/java/edu/kit/kastel/sdq/lissa/ratlr/ArchitectureTest.java
@DanielDango
Collaborator Author

@dfuchss the cache update has been merged

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 74 out of 77 changed files in this pull request and generated 6 comments.


@Override
public String getClassificationPromptKey() {
public static String getClassificationPromptKey() {
return PROMPT_TEMPLATE_KEY;

Copilot AI Jan 22, 2026

The method getClassificationPromptKey() has been changed from an instance method to a static method in SimpleClassifier and ReasoningClassifier. However, the method is still being removed from the abstract Classifier class rather than being changed to static there. This creates an inconsistency where individual classifier implementations have static methods but there's no common static interface. Consider whether a static factory method in the base Classifier class would be more appropriate, as shown in the new createClassificationPromptKey() method.

Suggested change
- return PROMPT_TEMPLATE_KEY;
+ return Classifier.createClassificationPromptKey(PROMPT_TEMPLATE_KEY);

Comment on lines 188 to 193
// TODO: careful, this is a hack to allow the optimization to overwrite the prompt and store it to the config
// for serialization. Maybe you can utilize ModuleConfiguration.with() instead?
if (!prompt.isEmpty()) {
configuration
.classifier()
.setArgument(Classifier.createClassificationPromptKey(configuration.classifier()), prompt);

Copilot AI Jan 22, 2026

The TODO comment indicates this is a known workaround that should be addressed. The hack modifies the configuration's classifier argument directly to inject the optimized prompt. Consider using ModuleConfiguration.with() or a cleaner approach to avoid mutating configuration state after initialization, which could lead to inconsistencies and make the code harder to maintain.

Suggested change
- // TODO: careful, this is a hack to allow the optimization to overwrite the prompt and store it to the config
- // for serialization. Maybe you can utilize ModuleConfiguration.with() instead?
- if (!prompt.isEmpty()) {
- configuration
- .classifier()
- .setArgument(Classifier.createClassificationPromptKey(configuration.classifier()), prompt);
+ // Update the classifier configuration immutably so the optimized prompt is stored for serialization
+ if (!prompt.isEmpty()) {
+ var promptKey = Classifier.createClassificationPromptKey(configuration.classifier());
+ var updatedClassifierConfig = configuration.classifier().withArgument(promptKey, prompt);
+ configuration = configuration.withClassifier(updatedClassifierConfig);
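
The difference between mutating a configuration in place and deriving an updated copy can be shown with a small record. This is a sketch under assumptions: `ModuleConfigSketch` and its `withArgument` method are hypothetical stand-ins, and this thread does not confirm that ModuleConfiguration offers a method with that name or signature (the TODO only mentions `ModuleConfiguration.with()`).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable module configuration; not LiSSA's ModuleConfiguration. */
record ModuleConfigSketch(String name, Map<String, String> arguments) {

    ModuleConfigSketch {
        // Defensive, unmodifiable copy so later changes to the caller's map cannot leak in.
        arguments = Map.copyOf(arguments);
    }

    /** Returns a new configuration with one argument replaced; the receiver is unchanged. */
    ModuleConfigSketch withArgument(String key, String value) {
        Map<String, String> copy = new HashMap<>(arguments);
        copy.put(key, value);
        return new ModuleConfigSketch(name, copy);
    }
}
```

Because the original instance never changes, any component that already holds a reference to it keeps seeing the prompt it was initialized with, which avoids the inconsistency the comment warns about.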

Member

@dfuchss dfuchss left a comment

I left some comments. Nice work :)

Comment on lines +79 to +94
for (Path optimizationConfig : configsToOptimize) {
LOGGER.info("Invoking the optimization pipeline with '{}'", optimizationConfig);
String optimizedPrompt = "";
try {
var optimization = new Optimization(optimizationConfig);
optimizedPrompt = optimization.run();
} catch (IOException e) {
LOGGER.warn(
"Optimization configuration '{}' threw an exception: {} \n Maybe the file does not exist?",
optimizationConfig,
e.getMessage());
}
for (Path evaluationConfig : configsToEvaluate) {
runEvaluation(evaluationConfig, optimizedPrompt);
}
}
Member

Move this into its own method to match the description in the JavaDoc :)

Comment on lines 23 to 38
public final class ClassifierCacheKey implements CacheKey {
private final String model;
private final int seed;
private final double temperature;
private final LargeLanguageModelCacheMode mode;
private final String content;

@JsonIgnore
private final String localKey;
public record ClassifierCacheKey(
String model,
int seed,
double temperature,
LargeLanguageModelCacheMode mode,
String content,
@JsonIgnore String localKey)
implements CacheKey {

Member

Please do not revert the change to ClassifierCacheKey.

* @return A new classifier cache key
*/
static ClassifierCacheKey of(ClassifierCacheParameter cacheParameter, String content) {
public static ClassifierCacheKey of(ClassifierCacheParameter cacheParameter, String content) {
Member

These should not be public.

<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
<version>${junit.version}</version>
Member

Why? That should be managed by the parent.

<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
<configuration combine.self="append">
<lineEndings>UNIX</lineEndings>
Member

Nice catch. I will move that to the parent.

Comment on lines 188 to 189
// TODO: careful, this is a hack to allow the optimization to overwrite the prompt and store it to the config
// for serialization. Maybe you can utilize ModuleConfiguration.with() instead?
Member

If possible, resolve the TODO.

/**
* Separator used in configuration names.
*/
public static final String CONFIG_NAME_SEPARATOR = "_";
Member

Isn't that already defined publicly in configuration?

Comment on lines +8 to +10
public static final String FIRST_SAMPLER = "first";
public static final String ORDERED_SAMPLER = "ordered";
public static final String SHUFFLED_SAMPLER = "shuffled";
Member

Are these strings used elsewhere?

Copilot AI review requested due to automatic review settings January 22, 2026 14:54
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 72 out of 75 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/GlobalMetric.java:1

  • The logger variable name 'logger' is inconsistent with the naming convention used in other classes in the codebase, which use 'LOGGER' in uppercase. For consistency, this should be renamed to 'LOGGER'.

public Evaluation(Configuration config) throws IOException {
this.configuration = config;
// TODO maybe dont?
this.configFile = null;

Copilot AI Jan 22, 2026

The TODO comment on line 152 suggests uncertainty about setting configFile to null. This should either be resolved with a proper implementation or the TODO should be removed if null is the intended behavior. Consider clarifying whether configFile is optional for evaluation instances created from Configuration objects.
